Abstract: We consider lossy source coding when side information affecting the distortion measure may be available at the encoder, decoder, both, or neither. For example, such distortion side information can model reliabilities for noisy measurements, sensor calibration information, or perceptual effects like masking and sensitivity to context. When the distortion side information is statistically independent of the source, we show that in many cases (e.g., for additive or multiplicative distortion side information) there is no penalty for knowing the side information only at the encoder, and there is no advantage to knowing it at the decoder. Furthermore, for quadratic distortion measures scaled by the distortion side information, we evaluate the penalty for lack of encoder knowledge and show that it can be arbitrarily large. In this scenario, we also sketch transform-based quantizer constructions which efficiently exploit encoder side information in the high-resolution limit.

1 Introduction 

In many large systems such as sensor networks, communication networks, and biological systems, different parts of the system may each have limited or imperfect information but must somehow cooperate. Key issues in such scenarios include the penalty incurred due to the lack of shared information, possible approaches for combining information from different sources, and the more general question of how different kinds of information can be partitioned based on the role of each system component.

One example of this scenario is when an observer records a signal x to be conveyed to a receiver who also has some additional signal side information w which is correlated with x. As demonstrated by various researchers, in many cases the observer and receiver can obtain the full benefit of the signal side information even if it is known only by the receiver [?, 2, ?].

In this paper we consider a different scenario where instead the observer has some distortion side information q which describes what components of the data are more sensitive to distortion than others, but the receiver may not have access to q. Specifically, let us model the differing importance of different signal components by measuring the distortion between the $i$th source sample, $x[i]$, and its quantized value, $\hat{x}[i]$, by a distortion function which depends on the side information $q[i]$: $d(x[i], \hat{x}[i], q[i])$.

*This work was conducted in part while R. Zamir was visiting the Digital Signal Processing Group at MIT.

In principle, one could treat the source-side information pair (q, x) as an "effective composite source" and apply conventional techniques to quantize it. Such an approach, however, ignores the different effects q and x have on the distortion. As often happens in lossy compression, a good understanding of the distortion measure may lead to better designs.

For example, a sensor may have side information corresponding to reliability estimates for measured data (which may or may not be available at the receiver). This may occur if the sensor can calibrate its accuracy to changing conditions (e.g., the amount of light, background noise, or other interference present), if the sensor averages data for a variety of measurements (e.g., combining results from a number of sub-sensors), or if some external signal indicates important events (e.g., an accelerometer indicating movement).

Alternatively, certain components of the signal may be more or less sensitive to distortion due to masking effects or context [1]. For example, errors in audio samples following a loud sound, or errors in pixels spatially or temporally near bright spots, are perceptually less relevant. Similarly, accurately preserving certain edges or textures in an image, or human voices in audio, may be more important than preserving background patterns or sounds. Masking, sensitivity to context, and similar effects are usually complicated functions of the entire signal. Yet often there is no need to explicitly convey information about these functions to the encoder. Hence, from the point of view of quantizing a given sample, it is reasonable to model such effects as side information.

Clearly, in performing data compression with distortion side information, the encoder should weight matching the more important data more heavily than matching the less important data. The importance of exploiting the different sensitivities of the human perceptual system is widely recognized by engineers involved in the construction and evaluation of practical compression algorithms when distortion side information is available at both observer and receiver. In contrast, the value and use of distortion side information known only at the encoder or only at the decoder has received relatively little attention in the information theory and quantizer design community. The rate-distortion function with decoder-only side information, relative to side-information-dependent distortion measures (as an extension of the Wyner-Ziv setting [3]), is given in [2]. A high-resolution approximation for this rate-distortion function for locally quadratic weighted distortion measures is given in [?].

We are not aware of an information-theoretic treatment of encoder-only side information with such distortion measures. In fact, the mistaken notion that encoder-only side information is never useful is common folklore. This may be due to a misunderstanding of Berger's result that side information which does not affect the distortion measure is never useful when known only at the encoder [?].

In this paper we study the rate-distortion trade-off when side information about the distortion sensitivity is available. We show that such distortion side information can provide an arbitrarily large advantage (relative to no side information) even when the distortion side information is known only at the encoder. Furthermore, we show that just as knowledge of signal side information is often only required at the decoder, knowledge of distortion side information is often only required at the encoder. Beyond the theoretical results, these observations serve as a useful guide for designing quantizers with distortion side information.

We first illustrate with some examples in Section 2 how distortion side information can be used even when known only by the observer. Next, in Section 3 we precisely define a problem model and state the relevant rate-distortion trade-offs. In Section 4 we present our main results characterizing when knowledge of distortion side information is sufficient at only the encoder, and sketch one practical construction.

2 Examples 

2.1 Discrete Uniform Source 

Consider the case where the source, $x[i]$, corresponds to $n$ samples, each uniformly and independently drawn from the finite alphabet $\mathcal{X}$ with cardinality $|\mathcal{X}| > n$. Let $q[i]$ correspond to $n$ binary variables indicating which source samples are relevant. Specifically, let the distortion measure be of the form $d(q, x, \hat{x}) = 0$ if and only if either $q = 0$ or $\hat{x} = x$. Finally, let the sequence $q[i]$ be statistically independent of the source, with $\mathbf{q}$ drawn uniformly from the $\binom{n}{k}$ binary sequences with exactly $k$ ones.

If the side information were unavailable or ignored, then losslessly communicating the source would require exactly $n \log |\mathcal{X}|$ bits. A better (though still sub-optimal) approach when encoder side information is available would be for the encoder to first tell the decoder which samples are relevant and then send only those samples. This would require $n H_b(k/n) + k \log |\mathcal{X}|$ bits, where $H_b(\cdot)$ denotes the binary entropy function. Note that if the side information were also known at the decoder, then the overhead required in telling the decoder which samples are relevant could be avoided, and the total rate required would only be $k \log |\mathcal{X}|$ bits. We will show that this overhead can in fact be avoided even without decoder side information.
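The rate comparison above can be checked with a few lines of Python. This is a sketch under our own assumptions: the helper names are hypothetical and rates are in bits. Besides the $n H_b(k/n)$ figure from the text, it also reports $\log_2 \binom{n}{k}$, the exact cost of indexing the relevant set, which $n H_b(k/n)$ upper-bounds.

```python
import math

def binary_entropy_bits(p):
    """Binary entropy H_b(p) in bits, using the convention 0 log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def section_2_1_rates(n, k, alphabet_size):
    """Bits used by each strategy for n uniform samples of which k are relevant."""
    b = math.log2(alphabet_size)  # bits per source symbol
    return {
        "ignore_side_info": n * b,
        "send_positions_approx": n * binary_entropy_bits(k / n) + k * b,
        "send_positions_exact": math.log2(math.comb(n, k)) + k * b,
        "informed_decoder": k * b,
    }

for name, bits in section_2_1_rates(n=100, k=20, alphabet_size=128).items():
    print(f"{name:>22}: {bits:7.2f} bits")
```

For these values the informed-decoder figure of 140 bits is the target that the Reed-Solomon construction reaches without any decoder side information.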

Pretend that the source samples $x[0], x[1], \ldots, x[n-1]$ are a codeword of an $(n, k)$ Reed-Solomon (RS) code (or more generally any MDS¹ code), with $q[i] = 0$ indicating an erasure at sample $i$. Use the RS decoding algorithm to "correct" the erasures and determine the $k$ corresponding information symbols, which are sent to the receiver. To reconstruct the signal, the receiver encodes the $k$ information symbols using the encoder for the $(n, k)$ RS code to produce the reconstruction $\hat{x}[0], \hat{x}[1], \ldots, \hat{x}[n-1]$. Only symbols with $q[i] = 0$ could have changed; hence $\hat{x}[i] = x[i]$ whenever $q[i] = 1$, and the relevant samples are losslessly communicated using only $k \log |\mathcal{X}|$ bits.

As illustrated in Fig. 1, RS decoding can be viewed as curve-fitting and RS encoding can be viewed as interpolation. Hence this source coding approach can be viewed as fitting a curve of degree $k-1$ to the points $x[i]$ where $q[i] = 1$. The resulting curve can be specified using just $k$ elements. It perfectly reproduces $x[i]$ where $q[i] = 1$ and interpolates the remaining points.
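The curve-fitting view can be sketched directly over a prime field GF(p) with p > n, using Lagrange interpolation in place of a general RS erasure decoder. This is an illustrative assumption rather than the construction in the text: the k transmitted symbols are taken to be the fitted curve's values at positions 0, ..., k-1 (a systematic form of the RS code), the evaluation points are the sample positions 0, ..., n-1, and all function names are hypothetical.

```python
import random

def lagrange_eval(xs, ys, t, p):
    """Value at t of the unique polynomial of degree < len(xs) through (xs, ys), over GF(p)."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (t - xm) % p
                den = den * (xj - xm) % p
        total = (total + yj * num * pow(den, -1, p)) % p  # modular inverse (Python 3.8+)
    return total

def encode(x, q, p):
    """Fit a degree-(k-1) curve to the relevant samples; send its values at 0..k-1."""
    relevant = [i for i, qi in enumerate(q) if qi == 1]
    values = [x[i] for i in relevant]
    return [lagrange_eval(relevant, values, t, p) for t in range(len(relevant))]

def decode(message, n, p):
    """Re-evaluate the curve at all n positions; needs no knowledge of q."""
    anchors = list(range(len(message)))
    return [lagrange_eval(anchors, message, t, p) for t in range(n)]

p, n, k = 11, 7, 5                      # alphabet GF(11); n = 7, k = 5 as in Fig. 1
rng = random.Random(0)
x = [rng.randrange(p) for _ in range(n)]
q = [1] * k + [0] * (n - k)
rng.shuffle(q)
x_hat = decode(encode(x, q, p), n, p)
print(x, q, x_hat)
```

The decoder only ever sees k field elements, yet every sample with q[i] = 1 is reproduced exactly because the curve was fitted through those points.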

¹The desired MDS code always exists since we assumed $|\mathcal{X}| > n$. For $|\mathcal{X}| < n$, near-MDS codes exist which give asymptotically similar performance, with an overhead that goes to zero as $n \to \infty$.



Figure 1: Losslessly encoding a source with n = 7 points where only k = 5 points are relevant (i.e., the unshaded ones) can be done by fitting a fourth-degree curve to the relevant points. The resulting curve will require k elements (yielding a compression ratio of k/n) and will exactly reproduce the desired points.

2.2 Gaussian Source 

A similar approach can be used to quantize a zero-mean, unit-variance, complex Gaussian source relative to quadratic distortion using the Discrete Fourier Transform (DFT). Specifically, to encode the source samples $x[0], x[1], \ldots, x[n-1]$, pretend that they are samples of a complex, periodic, Gaussian sequence with period $n$, which is band-limited in the sense that only its first $k$ DFT coefficients are non-zero. Using periodic, band-limited interpolation, we can use only the $k$ samples for which $q[i] = 1$ to find the corresponding $k$ DFT coefficients, $X[0], X[1], \ldots, X[k-1]$.

The relationship between the $k$ relevant source samples and the $k$ interpolated DFT coefficients has a number of special properties. In particular, this $k \times k$ transformation is unitary. Hence, the DFT coefficients are Gaussian with unit variance and zero mean. Thus, the $k$ DFT coefficients can be quantized with average distortion $D$ per coefficient using $k R(D)$ bits, where $R(D)$ represents the rate-distortion trade-off for the quantizer. To reconstruct the signal, the decoder simply transforms the quantized DFT coefficients back to the time domain. Since the DFT coefficients and the relevant source samples are related by a unitary transformation, the average error per sample for these relevant source samples is exactly $D$.
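The interpolation and reconstruction steps can be sketched with NumPy. The sketch assumes the orthonormal DFT convention (NumPy's `norm="ortho"`). It uses a plain uniform quantizer on the real and imaginary parts as a hypothetical stand-in for a rate-$R(D)$ quantizer, and it only checks how well the relevant samples are reproduced; it does not examine the statistics of the coefficients. Function names are ours.

```python
import numpy as np

def bandlimited_coeffs(samples, positions, n, k):
    """First k DFT coefficients of a period-n sequence through `samples` at `positions`.

    Row t of A is the orthonormal IDFT restricted to frequencies 0..k-1; with k
    distinct positions A is a Vandermonde matrix in distinct roots of unity,
    hence invertible.
    """
    t = np.asarray(positions)[:, None]
    m = np.arange(k)[None, :]
    A = np.exp(2j * np.pi * m * t / n) / np.sqrt(n)
    return np.linalg.solve(A, samples)

def to_time_domain(coeffs, n):
    """Zero-pad the k coefficients to length n and apply the orthonormal IDFT."""
    padded = np.zeros(n, dtype=complex)
    padded[: len(coeffs)] = coeffs
    return np.fft.ifft(padded, norm="ortho")

def quantize(z, step):
    """Hypothetical stand-in quantizer: round real and imaginary parts to a grid."""
    return step * (np.round(z.real / step) + 1j * np.round(z.imag / step))

rng = np.random.default_rng(0)
n, k = 8, 3
x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)  # unit variance
positions = np.sort(rng.choice(n, size=k, replace=False))                # samples with q[i] = 1
X = bandlimited_coeffs(x[positions], positions, n, k)
x_exact = to_time_domain(X, n)                   # no quantization
x_quant = to_time_domain(quantize(X, 0.01), n)   # what the decoder would output
```

Without quantization the relevant samples are reproduced exactly. With quantization, the matrix `A` consists of rows and columns of a unitary matrix, so its spectral norm is at most 1 and the coefficient quantization error cannot grow when mapped back to the relevant samples.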

Note that if the side information were unavailable or ignored, then at least $n R(D)$ bits would be required. If the side information were losslessly sent to the decoder, then $n H_b(k/n) + k R(D)$ bits would be required. Finally, even if the decoder had knowledge of the side information, at least $k R(D)$ bits would be needed. Hence, the DFT scheme achieves the same performance as when the side information is available at both the encoder and decoder, and is strictly better than ignoring the side information or losslessly communicating it.

3 Problem Model 

Vectors and sequences are denoted in bold (e.g., $\mathbf{x}$) with the $i$th element denoted as $x[i]$. Random variables are denoted using a sans serif font (e.g., x), while random vectors and sequences are denoted with bold sans serif (e.g., $\mathbf{x}$). We denote mutual information, entropy, and expectation as $I(x; y)$, $H(x)$, and $E[x]$. Calligraphic letters denote sets (e.g., $x \in \mathcal{X}$).

We are primarily interested in a particular type of side information (which we call "distortion side information") that is statistically independent of the source but affects the distortion measure. Specifically, we consider the source coding with distortion side information problem defined as the tuple

$$(\mathcal{X}, \hat{\mathcal{X}}, \mathcal{Q}, p_x(x), p_q(q), d(\cdot, \cdot, \cdot)). \tag{1}$$

A source $\mathbf{x}$ consists of the $n$ samples $x[1], x[2], \ldots, x[n]$ drawn from the alphabet $\mathcal{X}$. The distortion side information $\mathbf{q}$ likewise consists of $n$ samples drawn from the alphabet $\mathcal{Q}$. These random variables are generated according to the distribution

$$p_{\mathbf{x},\mathbf{q}}(\mathbf{x}, \mathbf{q}) = \prod_{i=1}^{n} p_x(x[i]) \cdot p_q(q[i]).$$

A rate-$R$ encoder, $f(\cdot)$, maps a source as well as possible side information to an index $i \in \{1, 2, \ldots, 2^{nR}\}$. The corresponding decoder, $g(\cdot)$, maps the resulting index as well as possible decoder side information to a reconstruction of the source. Distortion for a source $\mathbf{x}$ which is quantized and reconstructed to the sequence $\hat{\mathbf{x}}$ taking values in the alphabet $\hat{\mathcal{X}}$ is measured via

$$d(\mathbf{x}, \hat{\mathbf{x}}, \mathbf{q}) = \frac{1}{n} \sum_{i=1}^{n} d(x[i], \hat{x}[i], q[i]). \tag{2}$$

As usual, the rate-distortion function is the minimum rate such that there exists a system where the distortion is at most $D$ with probability approaching 1 as $n \to \infty$.

The four scenarios where $\mathbf{q}$ is available at the encoder, decoder, both, or neither are illustrated in Fig. 2, along with the symbol denoting each rate-distortion function.

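Proposition 1. The rate-distortion functions for the scenarios in Fig. 2 are

$$R_{\mathrm{NONE}}(D) = \inf_{p_{\hat{x}|x}(\hat{x}|x):\, E[d(x,\hat{x},q)] \le D} I(x; \hat{x}) \tag{3a}$$

$$R_{\mathrm{DEC}}(D) = \inf_{p_{u|x}(u|x),\, v(\cdot,\cdot):\, E[d(x, v(u,q), q)] \le D} I(x; u) - I(u; q) \tag{3b}$$

$$R_{\mathrm{ENC}}(D) = \inf_{p_{\hat{x}|x,q}(\hat{x}|x,q):\, E[d(x,\hat{x},q)] \le D} I(x, q; \hat{x}) = I(x; \hat{x} | q) + I(\hat{x}; q) \tag{3c}$$

$$R_{\mathrm{BOTH}}(D) = \inf_{p_{\hat{x}|x,q}(\hat{x}|x,q):\, E[d(x,\hat{x},q)] \le D} I(x; \hat{x} | q). \tag{3d}$$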



The rate-distortion functions in (3a), (3b), and (3d) follow from standard results (e.g., [?]). To obtain (3c) we can apply the classical rate-distortion theorem to the "super source" $x' = (x, q)$, as suggested by Berger [?]. In the sequel we characterize the penalty, or rate loss, incurred by having side information available only at the encoder, only at the decoder, or at neither, compared to full side information.
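The identity in (3c) is the chain rule for mutual information. A quick numerical sanity check with NumPy, assuming arbitrary small alphabets, independent $p_x$ and $p_q$, and a randomly drawn test channel $p_{\hat{x}|x,q}$ (all names hypothetical):

```python
import numpy as np

def mutual_information_bits(joint):
    """I(A;B) in bits for a 2-D joint pmf array with rows A and columns B."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa * pb)[mask])))

rng = np.random.default_rng(1)
nx, nq, nxh = 4, 3, 5
px = rng.dirichlet(np.ones(nx))                        # p_x
pq = rng.dirichlet(np.ones(nq))                        # p_q, independent of x
channel = rng.dirichlet(np.ones(nxh), size=(nx, nq))   # p(x_hat | x, q), shape (nx, nq, nxh)
joint = px[:, None, None] * pq[None, :, None] * channel  # p(x, q, x_hat)

lhs = mutual_information_bits(joint.reshape(nx * nq, nxh))   # I(x, q; x_hat)
i_xhat_q = mutual_information_bits(joint.sum(axis=0))        # I(x_hat; q) from p(q, x_hat)
i_cond = sum(pq[j] * mutual_information_bits(joint[:, j, :] / pq[j])  # I(x; x_hat | q)
             for j in range(nq))
print(lhs, i_cond + i_xhat_q)
```

The two printed numbers agree for any test channel; the objective in (3c) therefore exceeds the one in (3d) by exactly $I(\hat{x}; q)$.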



4 Main Results 



A system with encoder-only side information corresponds to a system with a fixed codebook but a variable partition which depends upon q.² As an almost trivial example, consider an encoder which observes $x = z + q$, where $z$ represents the true signal and $q$ represents observation noise, i.e., $d(z - \hat{x}) = d(x - q - \hat{x}) = d'(x, \hat{x}, q)$. By shifting the partition by $q$ to quantize $x - q$, the encoder achieves optimal performance. By contrast, systems with decoder-only side information correspond to fixed partitions with variable codebooks and often cannot exploit distortion side information as easily. In the following, we make these notions precise for more general distortion measures.

²This structure also appears in the study of robust codebooks [?].

Figure 2: Possible scenarios and rate-distortion functions with distortion side information at the decoder (b), encoder (c), both (d), and neither (a). The terms in (b) and (d) are also known as the Wyner-Ziv and conditional rate-distortion functions.
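The shift example $x = z + q$ can be simulated in a few lines. The sketch assumes Gaussian signal and noise, squared error $d(z - \hat{x}) = (z - \hat{x})^2$, and a uniform scalar quantizer with a fixed step; all of these, and the variable names, are illustrative choices. Both encoders use the same reconstruction points (the integer multiples of `step`), but the informed one shifts its partition by q.

```python
import numpy as np

def quantize(v, step):
    """Uniform scalar quantizer with the fixed codebook {step * m : m integer}."""
    return step * np.round(v / step)

rng = np.random.default_rng(0)
n, step = 100_000, 0.5
z = rng.standard_normal(n)            # true signal
q = rng.normal(scale=2.0, size=n)     # observation noise, known only to the encoder
x = z + q                             # what the encoder observes

informed = quantize(x - q, step)      # partition shifted by q; decoder needs no q
uninformed = quantize(x, step)        # encoder ignores q

mse_informed = np.mean((z - informed) ** 2)      # close to step**2 / 12
mse_uninformed = np.mean((z - uninformed) ** 2)  # dominated by E[q**2] = 4
print(mse_informed, mse_uninformed)
```

The decoder output is a codebook point in both cases; only the encoder's choice of cell changes, which is why no side information is needed at the receiver.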

4.1 Rate-Distortion Trade-Offs

We begin with the following theorems (proved in Appendix A), which show when side information at the encoder can be optimally used even though such side information may be useless if known only at the decoder.

Theorem 1. Let distortion side information q be statistically independent of the source x, let x be uniformly distributed over a group, and let distortion be measured via $d(x, \hat{x}, q) = d(x \ominus \hat{x}, q)$, where $\ominus$ represents a binary group operation. Then the rate-distortion function when q is available only at the encoder is the same as when it is available at both encoder and decoder, i.e., $R_{\mathrm{ENC}}(D) = R_{\mathrm{BOTH}}(D)$.

To state a similar result for continuous sources, we require various technical conditions describing a "smooth" source and distortion measure. Essentially, all that is required is that the source have a density and finite differential entropy, and that an entropy-maximizing distribution exists for the distortion measure of interest. For example, any vector source and distortion measure with

$$-\infty < h(\mathbf{x}) < \infty, \quad E[\|\mathbf{x}\|] < \infty, \quad \text{and} \quad d(\mathbf{x}, \hat{\mathbf{x}}, q) = \alpha_q + \beta_q \cdot \|\mathbf{x} - \hat{\mathbf{x}}\|^{\gamma_q} \tag{4}$$

will satisfy the required conditions provided $\alpha_q$, $\beta_q$, and $\gamma_q$ are non-negative. See [10] or [11] for a more detailed discussion of the necessary technical conditions.

Theorem 2. Let q be statistically independent of the source x and consider any "smooth" source and distortion measure satisfying the conditions in [?, Theorem 1] for each $q \in \mathcal{Q}$. Then the rate-distortion function when q is available only at the encoder is asymptotically the same as when it is available at both encoder and decoder, i.e., $\lim_{D \to D_{\min}} R_{\mathrm{ENC}}(D) - R_{\mathrm{BOTH}}(D) = 0$.³

Finally, in addition to the previous theorems showing when only the encoder 
requires q, we have the following result stating when q is useless to the decoder. 

Theorem 3. Let the distortion side information q be statistically independent of the source x and consider scaled distortion measures of the form $d(x, \hat{x}, q) = d_0(q) \cdot d_1(x, \hat{x})$. Then the rate-distortion function for q available only at the decoder is the same as when q is available at neither encoder nor decoder, i.e., $R_{\mathrm{DEC}}(D) = R_{\mathrm{NONE}}(D)$.

Combining our results shows that in many cases knowledge of q is optimal at the 
encoder and useless at the decoder. 

Corollary 1. For sources, side information, and weighted difference distortion measures satisfying the conditions in Theorems 1 and 3 (or, respectively, in Theorems 2 and 3), $R_{\mathrm{ENC}}(D) - R_{\mathrm{BOTH}}(D) = 0$ and $R_{\mathrm{NONE}}(D) - R_{\mathrm{DEC}}(D) = 0$, or, respectively, $\lim_{D \to D_{\min}} R_{\mathrm{ENC}}(D) - R_{\mathrm{BOTH}}(D) = 0$ and $\lim_{D \to D_{\min}} R_{\mathrm{NONE}}(D) - R_{\mathrm{DEC}}(D) = 0$.

4.2 The Penalty for Lack of Encoder Knowledge 

Consider generalizing the commonly used quadratic distortion model by scaling the distortion as a function of the side information as in [?]. Specifically, let $d(q, x, \hat{x}) = q \cdot (x - \hat{x})^2$. For this scenario, [?] implies that, in the high-resolution limit, $R_{\mathrm{BOTH}}(D) = h(x) - \frac{1}{2}\ln(2\pi e D) + \frac{1}{2}E[\ln q]$ while $R_{\mathrm{DEC}}(D) = h(x) - \frac{1}{2}\ln(2\pi e D) + \frac{1}{2}\ln E[q]$. Combining this with Corollary 1 shows that the asymptotic penalty for lack of encoder knowledge of q is $\frac{1}{2}(\ln E[q] - E[\ln q])$ nats per sample, which is non-negative by Jensen's inequality. Table 1 evaluates this penalty for various distributions of q. Note that in many cases the rate loss can be made arbitrarily large by choosing the shape parameter to place more probability near $q = 0$. Intuitively, this occurs because when $q \approx 0$, the informed encoder can transmit almost zero rate, while the uninformed encoder must transmit a large rate to achieve high resolution. Furthermore, all but one of these distributions would require infinite rate to losslessly communicate the side information.
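The penalty $\frac{1}{2}(\ln E[q] - E[\ln q])$ is easy to evaluate when $E[q]$ and $E[\ln q]$ are known in closed form. The sketch below is our own illustration, not a reproduction of Table 1. It covers a uniform q on (0, 1] and a gamma-distributed q with shape a and scale θ, for which $E[q] = a\theta$ and $E[\ln q] = \psi(a) + \ln\theta$, with ψ the digamma function; an exponential q (a = 1) gives γ/2 since ψ(1) = -γ. To stay dependency-free it uses a hand-written digamma based on the recurrence ψ(x) = ψ(x+1) - 1/x and the standard asymptotic series.

```python
import math

def digamma(x):
    """psi(x) for x > 0: shift x up to >= 6, then use the asymptotic expansion."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def penalty_nats(mean_q, mean_log_q):
    """Asymptotic rate loss (1/2)(ln E[q] - E[ln q]) per sample, in nats."""
    return 0.5 * (math.log(mean_q) - mean_log_q)

def gamma_penalty(shape, scale=1.0):
    """Penalty for q ~ Gamma(shape, scale); the scale cancels out."""
    return penalty_nats(shape * scale, digamma(shape) + math.log(scale))

uniform_penalty = penalty_nats(0.5, -1.0)   # q ~ Uniform(0, 1]: E[q] = 1/2, E[ln q] = -1
for shape in (10.0, 1.0, 0.1, 0.01):
    print(f"shape {shape:5}: {gamma_penalty(shape):.4f} nats")
print(f"uniform: {uniform_penalty:.4f} nats")
```

As the shape parameter shrinks, more probability mass moves toward q = 0 and the penalty grows without bound, in line with the discussion above.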

4.3 Quantizer Design 

As discussed in Section 2.2, for distortion side information indicating that a given source sample is relevant or completely irrelevant, a transform followed by a scalar quantizer⁴ efficiently exploits encoder side information. To generalize this transform coding construction, consider two-level side information with the alphabet $\mathcal{Q} = \{q_0, q_1\}$, where $q_1 > q_0 \ge 0$, and distortion measured via $d(q, x, \hat{x}) = q \cdot (x - \hat{x})^2$. Furthermore, let a random $k$ out of $n$ samples of $\mathbf{q}$ take the value $q_1$ while the other $n - k$ samples take the value $q_0$. If $\mathbf{q}$ is known at both encoder and decoder, then the optimal strategy is to use a rate-$R_0$ quantizer for samples where $q = q_0$ and a rate-$R_1$ quantizer, with $R_1 > R_0$, where $q = q_1$, such that the overall rate or distortion constraint is satisfied.

³Usually $D_{\min} = 0$, but to allow for more general distortion measures we define $D_{\min}$ as the minimum achievable distortion when arbitrarily high rates are allowed.

⁴Entropy coding the scalar quantizer's output is also possible without changing this result.

Table 1: The rate penalty (in nats) for not knowing side information with the given distribution at the encoder. Euler's constant is denoted by γ.

To asymptotically achieve the same performance via transform coding when $\mathbf{q}$ is known only at the encoder, we can use the following procedure. First, quantize the $k$ more important source samples, where $q[i] = q_1$, with a rate-$R_0$ quantizer to produce $\hat{x}_1$. Define the first-stage error signal as $e[i] = x[i] - \hat{x}_1[i]$, where we take $\hat{x}_1[i] = 0$ when $q[i] = q_0$ since these less important samples have not yet been quantized. Next, use band-limited interpolation to find the $k$ DFT coefficients $E[i]$ such that the IDFT of $E[i]$ accurately reproduces $e[i]$ when $q[i] = q_1$. Quantize these coefficients using a rate-$(R_1 - R_0)$ quantizer. Define the second-stage error signal as $e'[i] = x[i] - \hat{x}_1[i] - \hat{e}[i]$, where $\hat{e}[i]$ represents the IDFT of the quantized $E[i]$. Finally, quantize the $n - k$ samples of $e'[i]$ where $q[i] = q_0$ using a rate-$R_0$ quantizer to produce $\hat{x}_2$.

The receiver obtains the reconstruction $\hat{x}[i] = \hat{x}_1[i] + \hat{x}_2[i] + \hat{e}[i]$, consisting of a rate-$R_0$ scalar quantization of each source sample and a quantized shift $\hat{e}[i]$. Although the receiver cannot deduce from $\hat{e}[i]$ which samples were more important, $\hat{e}[i]$ was chosen by the encoder to make the quantization of the more important samples more accurate. As illustrated in Fig. 3 for $(n, k) = (2, 1)$, this type of system corresponds to a quantization lattice where the encoder can choose the partition to shape the error based on the side information. It is possible to show that in high resolution this system approaches the performance of a fully informed system (i.e., one using a rate-$R_0$ quantizer when $q[i] = q_0$ and a rate-$R_1$ quantizer when $q[i] = q_1$) [11]. Conceptually, in the high-resolution limit, edge effects become negligible and the shape of each cell in Fig. 3 approaches a rectangle. This system specializes to the one in Section 2.2 when $q_0 = 0$ and can be further generalized to larger side information alphabets [11].
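The three-stage procedure of this section can be sketched with NumPy for a complex source. This is a hypothetical illustration under our own assumptions: uniform quantizers on the real and imaginary parts stand in for the rate-$R_0$ and rate-$(R_1 - R_0)$ quantizers (their steps `coarse` and `fine` play the role of the rates), the orthonormal DFT convention is used, and the function names are ours. The decoder receives only the per-sample coarse values and the quantized coefficients, never q.

```python
import numpy as np

def quantize(v, step):
    """Uniform quantizer on real and imaginary parts (stand-in for a rate-R quantizer)."""
    return step * (np.round(v.real / step) + 1j * np.round(v.imag / step))

def to_time_domain(coeffs, n):
    """Zero-pad the coefficients to length n and apply the orthonormal IDFT."""
    padded = np.zeros(n, dtype=complex)
    padded[: len(coeffs)] = coeffs
    return np.fft.ifft(padded, norm="ortho")

def encode(x, important, coarse, fine):
    """Encoder with access to q: boolean `important` marks the samples with q[i] = q_1."""
    n, k = len(x), int(np.count_nonzero(important))
    # Stage 1: coarse quantization of the important samples; x1 = 0 elsewhere.
    x1 = np.where(important, quantize(x, coarse), 0)
    e = x - x1
    # Stage 2: band-limited interpolation of e on the important samples, then fine quantization.
    t = np.flatnonzero(important)[:, None]
    A = np.exp(2j * np.pi * np.arange(k)[None, :] * t / n) / np.sqrt(n)
    E_hat = quantize(np.linalg.solve(A, e[important]), fine)
    e_hat = to_time_domain(E_hat, n)
    # Stage 3: coarse quantization of the second-stage error on the less important samples.
    x2 = np.where(important, 0, quantize(x - x1 - e_hat, coarse))
    return x1 + x2, E_hat   # one coarse value per sample, plus k coefficients

def decode(coarse_values, E_hat):
    """Decoder without q: add the IDFT shift to the per-sample coarse values."""
    return coarse_values + to_time_domain(E_hat, len(coarse_values))

rng = np.random.default_rng(3)
n, k = 16, 4
x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
important = np.zeros(n, dtype=bool)
important[rng.choice(n, size=k, replace=False)] = True
coarse, fine = 0.5, 0.5 / 16
x_hat = decode(*encode(x, important, coarse, fine))
err_important = np.mean(np.abs(x - x_hat)[important] ** 2)
err_other = np.mean(np.abs(x - x_hat)[~important] ** 2)
print(err_important, err_other)
```

With these step sizes the important samples are reproduced far more accurately than the others, even though `decode` never sees `important`.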




Figure 3: The quantization points and possible partitions for a transform coder. If the encoder knows the horizontal error (respectively, vertical error) is more important, it can use the partition on the left to increase horizontal accuracy (resp., vertical accuracy). The decoder only needs to know the quantization point, not the partition.

A Proofs 

Proof of Theorem 1: For a finite group, choosing $z^*$ to maximize $H(z|q)$ subject to the constraint $E[d(z, q)] \le D$ yields the following lower bound on $R_{\mathrm{ENC}}(D)$:

$$I(\hat{x}; x, q) = H(x) + H(q) - H(x, q | \hat{x}) \tag{5}$$

$$= \log|\mathcal{X}| + H(q) - H(q|\hat{x}) - H(x \ominus \hat{x} | \hat{x}, q) \tag{6}$$

$$\ge \log|\mathcal{X}| - H(x \ominus \hat{x} | q) \tag{7}$$

$$\ge \log|\mathcal{X}| - H(z^*|q) \tag{8}$$

where (7) follows since conditioning reduces entropy (so $H(q) - H(q|\hat{x}) \ge 0$ and $H(x \ominus \hat{x} | \hat{x}, q) \le H(x \ominus \hat{x} | q)$), and (8) follows from the choice of $z^*$. Choosing the test-channel distribution so that $x \ominus \hat{x} = z^*$, with $z^*$ generated from q independently of x, achieves this bound with equality and must therefore be optimal. Furthermore, since $\hat{x}$ and q are statistically independent for this test-channel distribution, $I(\hat{x}; q) = 0$, and thus comparing (3c) and (8) to (3d) shows $R_{\mathrm{ENC}}(D) = R_{\mathrm{BOTH}}(D)$ for finite groups. The same argument holds for continuous groups with entropy replaced by differential entropy and $|\mathcal{X}|$ replaced by the Lebesgue measure of $\mathcal{X}$. For more general groups (e.g., mixed groups with both discrete and continuous components), a more complicated convexity argument is required [11]. □
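For a small cyclic group the bound (8) can be evaluated numerically. The sketch assumes $\mathcal{X} = \mathbb{Z}_m$, a finite side-information alphabet, and a distortion with $d(0, q) = 0$ and $d(z, q) \ge 0$; these, the function names, and the example distortion are our own choices. Maximizing $H(z|q)$ subject to $E[d(z,q)] \le D$ with Lagrange multipliers gives $p(z|q) \propto e^{-\lambda d(z,q)}$ with a single multiplier λ shared by all values of q, which the code finds by bisection.

```python
import math

def gibbs(lam, dvals):
    """p(z|q) proportional to exp(-lam * d(z, q)), for one value of q."""
    w = [math.exp(-lam * d) for d in dvals]
    total = sum(w)
    return [wi / total for wi in w]

def entropy_bits(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def theorem1_rate_bits(m, q_probs, dist, D, iters=200):
    """log2(m) - H(z*|q) from (8) for the group Z_m; requires D > 0."""
    table = {q: [dist(z, q) for z in range(m)] for q in q_probs}

    def avg_dist(lam):
        return sum(pq * sum(pz * d for pz, d in zip(gibbs(lam, table[q]), table[q]))
                   for q, pq in q_probs.items())

    if avg_dist(0.0) <= D:
        lam = 0.0                      # uniform error already meets the constraint
    else:
        lo, hi = 0.0, 1.0
        while avg_dist(hi) > D:        # expected distortion decreases in lam
            hi *= 2.0
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if avg_dist(mid) > D else (lo, mid)
        lam = hi                       # feasible end of the bracket
    cond_entropy = sum(pq * entropy_bits(gibbs(lam, table[q])) for q, pq in q_probs.items())
    return math.log2(m) - cond_entropy

# Example: Z_4 with d(z, q) = q * [z != 0] and two equally likely reliability levels.
q_probs = {0.5: 0.5, 2.0: 0.5}
dist = lambda z, q: q * (z != 0)
for D in (0.05, 0.2, 0.5, 1.0):
    print(D, round(theorem1_rate_bits(4, q_probs, dist, D), 4))
```

Since the encoder can draw $z^*$ from this conditional law itself, the proof shows the same value is attained with q known only at the encoder.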

Proof Sketch for Theorem 2: Due to space constraints we only sketch the ideas behind the proof. As for Theorem 1, we can develop a lower bound for $R_{\mathrm{BOTH}}(D)$ using an entropy-maximizing distribution and the Shannon lower bound [10]. Then, by using the resulting test-channel distribution for $R_{\mathrm{ENC}}(D)$, we can show that $I(\hat{x}; q)$ goes to zero in the high-resolution limit, and therefore $R_{\mathrm{ENC}}(D) \to R_{\mathrm{BOTH}}(D)$. □



Proof of Theorem 3: When side information is available only at the decoder, Wyner-Ziv coding is optimal [?]. First we compute the optimal reconstruction function $v(\cdot, \cdot)$:

$$v(u, q) = \arg\min_{\hat{x}} E[d(x, \hat{x}, q) \mid q = q, u = u] \tag{9}$$

$$= \arg\min_{\hat{x}} d_0(q) E[d_1(x, \hat{x}) \mid q = q, u = u] \tag{10}$$

$$= \arg\min_{\hat{x}} E[d_1(x, \hat{x}) \mid q = q, u = u] \tag{11}$$

$$= \arg\min_{\hat{x}} E[d_1(x, \hat{x}) \mid u = u] \tag{12}$$

where (10) follows by the assumption that we have a separable distortion measure, (11) follows because the non-negative factor $d_0(q)$ does not change the minimizer (when $d_0(q) = 0$, every $\hat{x}$ is a minimizer), and (12) follows because q is statistically independent of (x, u): q is independent of x by assumption, and u is generated at the encoder from x alone. Thus, since neither the optimal reconstruction function $v(\cdot, \cdot)$ nor the auxiliary random variable u depends on q, and the term $I(u; q)$ in (3b) vanishes, knowing q at only the decoder provides no advantage. □